Time-Varying Gaussian Process Bandit Optimization

Authors

  • Ilija Bogunovic
  • Jonathan Scarlett
  • Volkan Cevher
Abstract

We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally.
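The contrast between sharp resetting and smooth forgetting can be illustrated with a rough sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the squared-exponential kernel, the discount form `(1 - eps) ** (|i - j| / 2)`, the constant `beta`, and the helper names `discounted_posterior`, `ucb_choice`, and `block_start` are all hypothetical and should not be read as the authors' implementation.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def discounted_posterior(x_obs, y_obs, x_grid, eps, noise_var=0.01):
    """GP posterior mean and std for the next round, down-weighting old data.

    ASSUMPTION: the correlation between rounds i and j decays as
    (1 - eps) ** (|i - j| / 2); eps = 0 recovers a time-invariant GP.
    """
    n = len(x_obs)
    if n == 0:
        # No data yet: return the prior (zero mean, unit variance).
        return np.zeros(len(x_grid)), np.ones(len(x_grid))
    rounds = np.arange(1, n + 1)
    # Discount between pairs of past rounds, and between each past round
    # and the upcoming round n + 1.
    pair_disc = (1.0 - eps) ** (np.abs(rounds[:, None] - rounds[None, :]) / 2.0)
    next_disc = (1.0 - eps) ** ((n + 1 - rounds) / 2.0)
    K = rbf_kernel(x_obs, x_obs) * pair_disc + noise_var * np.eye(n)
    k_star = rbf_kernel(x_obs, x_grid) * next_disc[:, None]
    mean = k_star.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def ucb_choice(x_obs, y_obs, x_grid, eps=0.0, beta=4.0):
    """Index of the grid point maximizing mean + sqrt(beta) * std."""
    mean, std = discounted_posterior(x_obs, y_obs, x_grid, eps)
    return int(np.argmax(mean + np.sqrt(beta) * std))


def block_start(t, block_len):
    """First round (0-indexed) of the reset block containing round t."""
    return (t // block_len) * block_len
```

Under these assumptions, a resetting variant would call `ucb_choice(x_hist[s:], y_hist[s:], grid)` with `s = block_start(t, H)`, discarding everything before the current block, while a forgetting variant would call `ucb_choice(x_hist, y_hist, grid, eps=0.03)` on the full history. The values of `H` and `eps` here are placeholders, not settings taken from the paper.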

Similar resources

Material for “Time-Varying Gaussian Process Bandit Optimization”

t (x)2, as was to be shown. B Learning ε via Maximum-Likelihood In this section, we provide an overview of how ε can be learned from training data in a principled manner; the details can be found in [20, Section 4.3] and [6, Section 5]. Throughout this appendix, we assume that the kernel matrix is parametrized by a set of hyperparameters θ (e.g., θ = (ν, l) for the Matérn kernel), and ε. Let ȳ ...

On 2-armed Gaussian Bandits and Optimization

We explore the 2-armed bandit with Gaussian payoffs as a theoretical model for optimization. We formulate the problem from a Bayesian perspective, and provide the optimal strategy for both 1 and 2 pulls. We present regions of parameter space where a greedy strategy is provably optimal. We also compare the greedy and optimal strategies to a genetic-algorithm-based strategy. In doing so we correct...

TIME-VARYING FUZZY SETS BASED ON A GAUSSIAN MEMBERSHIP FUNCTIONS FOR DEVELOPING FUZZY CONTROLLER

The paper presents a novel type of fuzzy sets, called time-Varying Fuzzy Sets (VFS). These fuzzy sets are based on Gaussian membership functions, depend on the error, and are characterized by the displacement of the kernels to both the right and left sides of the universe of discourse; the two extreme kernels of the universe are fixed for all time. In this work we focus only on t...

Lower Bounds on Regret for Noisy Gaussian Process Bandit Optimization

In this paper, we consider the problem of sequentially optimizing a black-box function f based on noisy samples and bandit feedback. We assume that f is smooth in the sense of having a bounded norm in some reproducing kernel Hilbert space (RKHS), yielding a commonly-considered non-Bayesian form of Gaussian process bandit optimization. We provide algorithm-independent lower bounds on the simple ...

Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design

Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multiarmed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP...

Publication date: 2016